Python: Fix syntax error when = is used as a format fill character
#21274
An example (provided by @redsun82) is the string `f"{x:=^20}"`. Parsing this (with unnamed nodes shown) illustrates the problem:

```
module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 11]
    string [0, 0] - [0, 11]
      string_start [0, 0] - [0, 2]
      interpolation [0, 2] - [0, 10]
        "{" [0, 2] - [0, 3]
        expression: named_expression [0, 3] - [0, 9]
          name: identifier [0, 3] - [0, 4]
          ":=" [0, 4] - [0, 6]
          ERROR [0, 6] - [0, 7]
            "^" [0, 6] - [0, 7]
          value: integer [0, 7] - [0, 9]
        "}" [0, 9] - [0, 10]
      string_end [0, 10] - [0, 11]
```

Observe that we've managed to combine the format specifier token `:` and the fill character `=` in a single token (which doesn't match the `:` we expect in the grammar rule), and hence we get a syntax error.

If we change the `=` to some other character (e.g. a `-`), we instead get

```
module [0, 0] - [2, 0]
  expression_statement [0, 0] - [0, 11]
    string [0, 0] - [0, 11]
      string_start [0, 0] - [0, 2]
      interpolation [0, 2] - [0, 10]
        "{" [0, 2] - [0, 3]
        expression: identifier [0, 3] - [0, 4]
        format_specifier: format_specifier [0, 4] - [0, 9]
          ":" [0, 4] - [0, 5]
        "}" [0, 9] - [0, 10]
      string_end [0, 10] - [0, 11]
```

and in particular no syntax error.

To fix this, we want to ensure that the `:` is lexed on its own, and the `token(prec(1, ...))` construction can be used to do exactly this.

Finally, you may wonder why `=` is special here. I think what's going on is that the lexer knows that `:=` is a token on its own (because it's used in the walrus operator), and so it greedily consumes the following `=` with this in mind.
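The greedy-lexing explanation above can be sketched with a toy model. This is only an illustration, not tree-sitter's actual lexer: the `next_token` helper and the candidate tables are hypothetical, and the only rule modelled is "higher lexical precedence wins, then the longer match wins".

```python
def next_token(text, candidates):
    """Toy lexer step: among candidate tokens that prefix `text`, pick the
    one with the highest precedence, breaking ties by match length.

    `candidates` is a list of (token_text, precedence) pairs (hypothetical).
    """
    matching = [(prec, len(tok), tok) for tok, prec in candidates
                if text.startswith(tok)]
    return max(matching)[2] if matching else None


# What remains of the interpolation in f"{x:=^20}" right after the expression `x`.
rest = ":=^20}"

# Modelled "before": ":" and ":=" have equal precedence, so the longer ":=" wins.
before = next_token(rest, [(":", 0), (":=", 0)])

# Modelled "after": the format-specifier ":" is given precedence 1,
# so it is chosen even though ":=" would be a longer match.
after = next_token(rest, [(":", 1), (":=", 0)])

print(before)  # :=
print(after)   # :
```

With equal precedence the walrus token swallows the fill character; raising the precedence of `:` leaves the `=` to be consumed as part of the format specifier.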
Pull request overview
This pull request fixes a syntax error that occurred when using `=` as a fill character in f-string format specifiers (e.g., `f"{x:=^20}"`). The issue was caused by the lexer greedily consuming `:=` as a single token (the walrus operator) instead of lexing `:` separately followed by `=`.
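For reference, a minimal sketch of what the affected syntax means at runtime in Python itself: in the format specifier `=^20`, the `=` is the fill character, `^` requests centering, and `20` is the field width. The variable names here are illustrative only.

```python
x = 42

# "=" is the fill character, "^" centers the value, 20 is the total width.
centered = f"{x:=^20}"
print(centered)  # =========42=========

# The same format specifier passed to the built-in format() gives the same result.
same = format(x, "=^20")
```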
Changes:
- Modified the grammar to use `token(prec(1, ':'))` in format specifiers to ensure `:` is lexed independently
- Added test cases for the fixed behavior in both `strings.py` and `template_strings_new.py`
- Regenerated tree-sitter parser artifacts (grammar.json, node-types.json, parser.h, array.h)
- Bumped extractor version from 7.1.7 to 7.1.8
Reviewed changes
Copilot reviewed 10 out of 11 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| python/ql/lib/change-notes/2026-02-05-fix-format-fill-character-misparse.md | Documents the fix for format fill character parsing |
| python/extractor/tsg-python/tsp/grammar.js | Core fix: wraps `:` in `format_specifier` with `token(prec(1, ...))` |
| python/extractor/tsg-python/tsp/src/grammar.json | Regenerated from grammar.js with the format_specifier change |
| python/extractor/tsg-python/tsp/src/node-types.json | Regenerated parser metadata |
| python/extractor/tsg-python/tsp/src/tree_sitter/parser.h | Updated tree-sitter runtime header |
| python/extractor/tsg-python/tsp/src/tree_sitter/array.h | Updated tree-sitter runtime header |
| python/extractor/tests/parser/strings.py | Added test case for f-string with `=` fill character |
| python/extractor/tests/parser/template_strings_new.py | Added test case for template string with format specifier |
| python/extractor/tests/parser/template_strings_new.expected | Regenerated expected output including new test |
| python/extractor/semmle/util.py | Version bump to 7.1.8 |
if 6:
    t"Implicit concatenation: " t"Hello, {name}!" t" How are you?"
if 7:
    t"With a format specifier: {name:=^20}"
Copilot
AI
Feb 5, 2026
Syntax Error (in Python 3).
See below for a potential fix:
""
if 2:
    f"Hello, {name}!"
if 3:
    f"Value: {value:.2f}, Hex: {value:#x}"
if 4:
    "Just a regular string."
if 5:
    f"Multiple {first} and {second} placeholders."
if 6:
    "Implicit concatenation: " f"Hello, {name}!" " How are you?"
if 7:
    f"With a format specifier: {name:=^20}"
This is pretty funny -- the alert is based on the current analysis, which indeed has a syntax error here (because of the parser issue that this PR fixes).
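One way to confirm that the construct is valid Python is to ask CPython's own parser. The sketch below assumes Python 3.8 or later and uses the f-string form of the example (template strings need a newer interpreter); it checks that `:=^20` is parsed as a format specifier and not as a walrus expression.

```python
import ast

# Parse the f-string as an expression and take its single interpolation field.
tree = ast.parse('f"{x:=^20}"', mode="eval")
field = tree.body.values[0]

print(type(field).__name__)               # FormattedValue
print(type(field.value).__name__)         # Name
print(field.format_spec.values[0].value)  # =^20

# A walrus inside an f-string has to be parenthesized; then it is a NamedExpr.
walrus = ast.parse('f"{(x:=5)}"', mode="eval").body.values[0]
print(type(walrus.value).__name__)        # NamedExpr
```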
redsun82
left a comment
LGTM, thanks for the quick fix!
I understand the tree_sitter C/C++ header changes may come from a tooling version bump. Might it make sense to mark those files as generated too?
Yeah, I really ought to exclude anything in […]. I'll create a separate PR for this.
An example (provided by @redsun82 from a report by @grahamcracker1234) is the string `f"{x:=^20}"`. Parsing this (with unnamed nodes shown) illustrates the problem: we've managed to combine the format specifier token `:` and the fill character `=` in a single token (which doesn't match the `:` we expect in the grammar rule), and hence we get a syntax error.

If we change the `=` to some other character (e.g. a `-`), we instead get a `format_specifier` node, and in particular no syntax error.

To fix this, we want to ensure that the `:` is lexed on its own, and the `token(prec(1, ...))` construction can be used to do exactly this.

Finally, you may wonder why `=` is special here. I think what's going on is that the lexer knows that `:=` is a token on its own (because it's used in the walrus operator), and so it greedily consumes the following `=` with this in mind.